Elderly and visually impaired indoor activity monitoring based on Wi-Fi and Deep Hybrid convolutional neural network

A drop in physical activity and a deterioration in the capacity to undertake daily life activities are both connected with ageing and have negative effects on physical and mental health. An Elderly and Visually Impaired Human Activity Monitoring (EV-HAM) system that keeps tabs on a person’s routine and steps in when a change in behaviour or a crisis occurs might greatly help an elderly or visually impaired person. These individuals may find greater freedom with the help of an EV-HAM system. As the backbone of human-centric applications like actively supported living and in-home monitoring for the elderly and visually impaired, an EV-HAM system is essential. Big data-driven product design is flourishing in this age of 5G and the IoT. Recent advancements in processing power and software architectures have also contributed to the emergence and development of artificial intelligence (AI). In this context, the digital twin has emerged as a state-of-the-art technology that bridges the gap between the real and virtual worlds by evaluating data from several sensors using artificial intelligence algorithms. Although Wi-Fi-based human activity identification techniques have reported promising findings so far, their effectiveness is vulnerable to environmental variations. Using environment-independent fingerprints generated from the Wi-Fi channel state information (CSI), we introduce Wi-Sense, a human activity identification system that employs a Deep Hybrid convolutional neural network (DHCNN). The proposed system begins by collecting the CSI with a regular Wi-Fi Network Interface Controller. Wi-Sense uses the CSI ratio technique to lessen the effect of noise and the phase offset. t-Distributed Stochastic Neighbor Embedding (t-SNE) is then used to eliminate unnecessary data. In this process, the data dimension is decreased and the negative effects of the environment are eliminated. The resulting spectrogram of the processed data exposes the activity’s micro-Doppler fingerprints as a function of both time and location. These spectrograms are used to train a DHCNN. Based on our findings, EV-HAM can accurately identify these actions 99% of the time.

Statistical analysis of the global total shows that the proportion of the population that is 65 and up is growing at an alarming rate. The World Health Organization predicts that by 2050, 16% of the global population will be 65 or older. The Madrid World Action Plan on Ageing has highlighted numerous key approaches, including "making sure facilitating and supporting settings," to welcome this demographic change and to prepare for the social reform it entails. Creating senior-friendly homes where people may remain safe, healthy, and independent for as long as possible is a top concern, which is why this directive was issued. For this reason, it is crucial to create reliable, inconspicuous, and geriatric-friendly in-home surveillance systems connected to a Health Information System (HIS) and programmed to summon help from a local emergency healthcare provider immediately. The foundation of every home monitoring system is human activity recognition (HAR). HAR often involves making sense of sensor data to identify specific types of human behaviour. Data sensing, analysis, and classification components make up the bulk of a typical HAR system. The self-aware has a sensor

Related works
The computer models used in the deep learning method are built up of several processing layers. Thus, the inherent structure in complicated and extensive datasets may be learned automatically. Deep learning is commonly utilised to complete tasks in healthcare using data collected by mobile systems 15. The authors of 16 explain how mobile devices and wearables paired with sensors are revolutionising health monitoring. There is a lot of potential for these gadgets to collect analytical data on many people, and deep learning is seen as a crucial part of analysing this new kind of data. However, there is still room for improvement in applying deep learning in healthcare sensing, primarily because of hardware limitations. Instead of relying on feature extractors applied to time-series sensor signals, the authors of 17 argue that Deep Convolutional Neural Networks (DCNN) can learn the discriminant features automatically for activity classification by using an activity image constructed from signal sequences from accelerometers and gyroscopes. Compared to the state-of-the-art, their outcomes on three available datasets were superior.
While previous works have found that some features perform well in recognising one action but poorly for others, in 18 a Convolutional Neural Network (CNN) is used to perform the HAR job competently, extracting features of human activities (such as kitchen tasks, walking, or running) without requiring domain expertise. They highlight how a CNN approach may successfully capture variations of the same activity by extracting features 19 that are local in both the signal and the spectrum. This system is also evaluated on three publicly available datasets, with the researchers achieving a best accuracy of 96.88%. The CNN utilised in 1,20 completes a HAR job with input from a single accelerometer, allowing an accelerometer-based HAR system to be built on a mobile platform without needing specialised hardware. Using an Android app to capture tri-axial accelerometer data from participants, the findings reveal a good accuracy of 93.8%. To preserve data variety, the studies were repeated with the device placed at three other locations on the body. When compared to other prominent classifiers on the same dataset, such as the Support Vector Machine (SVM), the CNN appears to have extracted more valuable features than the manually computed Fast Fourier Transform (FFT) and Discrete Cosine Transform (DCT) input features used by the SVM 21.
Previous research on these technologically advanced wheelchairs uses additional measurements, including heart rate, hypertension, glucose level, respiration rate, human actions, obstacle recognition, and movement 22,23. Cushioned wheelchairs have pressure sensors that detect the user's changing body position. When a sensor recognises a potentially harmful position, an alert sounds. For a wheelchair with a pressure sensor mattress, sensor readings are optimised, and then several classifiers are applied to the data to choose the optimal one. In addition, daily actions, including stair climbing, chair climbing, standing still, and jogging, were detected using a revised modular support vector machine 24. Walking behaviours were also detected in raw data using the wavelet transform and the k-nearest neighbour classifier 25. Modern society places great value on human activity recognition (HAR). Yet, little work has been done to tackle the difficulty of classifying time-series data. Still, human activity detection using integrated smartphone sensors is a promising new avenue of study 26. The author of 27 introduces the Spatiotemporal cRoss (STAR)-transformer to adequately express two cross-modal characteristics as an identifiable vector. Keyframes are first produced as global matrix tokens and skeletons as associated map tokens from the input video and skeleton sequence. After being compiled into multi-class tokens, these are fed through the STAR transformer.
In Garcia et al. 19, a new HRC idea is presented, consisting of an HRC framework for controlling assembly operations carried out either jointly by people and robots or independently by either group. When managing the setup, an HRC architecture that uses deep learning techniques needs only a single RGB camera stream to make predictions about the cooperative workplace and human behaviour 28. The article 29 manually collects features for people aged 60 and above for analysis. The feature fusion technique used recognises the activity accurately. However, no automated data collection is performed. In Liang et al. 30, radar-based data collection is done for HAR. This radar is non-wearable and senses the activity from a specific distance by being mounted on mobile robots. The central issue is that if the person moves beyond a specific range, the accuracy of the data is not assured. It may also sometimes miss a serious fall. The article recognises human activities using sensors at all time intervals. This sensor uses a convolution operation for sensing. However, robust real-time monitoring is not ensured. To overcome the sensing problem, this research uses Wi-Sense technology for monitoring HAR using video recording and image capturing techniques. Further, a hybrid deep learning model is used to access and process the images.
Opportunistic scheduling is a technique used in communication networks to maximise the utilisation of available resources (such as bandwidth or relay nodes) by selecting the best opportunities for data transmission [31][32][33]. It employs Model Predictive Control techniques to optimise security responses to enhance networked systems' overall security and resilience in the face of cyber threats 34,35. This optimisation is done while considering and mitigating the potential interference or mutual coupling effects between adjacent antenna elements 36. This method analyses the similarity of paths and uses matrix algebra as part of its computational approach. Link prediction in directed networks is relevant in various fields [37][38][39]. It suggests that deep learning techniques are being applied to mitigate security risks and improve the overall security posture of IoT ecosystems. It is crucial as IoT devices become more integrated into our daily lives and various industries [40][41][42]. This research is relevant in modern wireless communication networks and the increasing demand for reliable and secure data transmission in various applications 43,44. The system's purpose is not just image analysis but also clinical evaluation, which means it aims to provide medical assessments and diagnoses based on these images 45,46. Deep neural networks are employed to perform the task of matching or tracking features within soft tissues. Deep learning is a subset of artificial intelligence that excels at recognising patterns in data, making it suitable for tasks like image analysis and tracking [47][48][49]. This system utilises a specific microcontroller (STM32) for precise control of laser pulses, and it incorporates a Photomultiplier Tube (PMT) with adjustable gain to enhance the sensitivity of echo detection 50,51. The Generalized buffer algorithm is a versatile approach to managing and controlling data or processes, providing a flexible solution that can be adapted to different situations where buffering is necessary for efficient operation 52,53. The technology described involves the development of a framework that employs machine learning to optimise the entire communication process seamlessly within a system that integrates fibre-optic and terahertz communication technologies 54,55. This information could be significant for neuroscience and clinical applications, as it may provide insights into the potential therapeutic or research applications of tACS for modulating neural activity in deeper brain structures [56][57][58]. It uses a structured hierarchical semantic network to represent and organise technological concepts or domains. Then, it employs dual-link prediction techniques to identify and assess potential connections or relationships within this network 59.

Proposed methodology
At the outset, a broad range of sensors (smartphones, Wi-Fi, watches, Bluetooth, sound, etc.) captures the raw inputs. Figure 1 shows an overview of a popular Pattern Recognition approach that may be used to deal with HAM. Second, when using deep learning techniques, attributes are derived from the readings. Some examples of these parameters include the average, the range, the DC, and the intensity. Lastly, such features are utilized as

Wearables and sensors
Rest, moving, lying down, climbing, running, and jumping are all considered physical pursuits. More and more research is linking regular physical exercise to a lower risk of numerous chronic illnesses, including overweight, diabetes, and cardiovascular events, as well as improved mental health of the elderly. Wearable devices capture a wealth of information during these activities, such as activity length and intensity, which might shed light on a person's daily routine and health status. Dedicated solutions like Fitbit, for instance, may measure and record energy expenditure on smart devices. This can be a significant first step in monitoring physical activity and avoiding the onset of chronic illnesses. More than that, studies have found a link between how people get around (car, foot, bike, and public transportation) and weight gain. Users can benefit from more exercise and an improved understanding of their diseases if doctors can access data on their daily movement and transportation habits. Therefore, incorporating smartwatches into the fitness and leisure industries, one of the most pervasive uses of HAR technologies, has the chance to bring significant benefits. Mobile phones, wristbands, spectacles, bands, gloves, bracelets, pendants, sneakers, and E-tattoos are just a few of the commercially available or prototype smart gadgets currently in demand. Overall, these gadgets are designed to be worn by a person from head to toe. Miniaturizing and lightening wearables has been made possible by developments in micro-electro-mechanical system (MEMS) technology (microscopic devices comprising a central unit, such as a microcomputer, and components that interact with the environment, such as microelectronics), which in turn reduces the barrier to entry for the widespread adoption of smartwatches and networking technologies. The goal of HAR is to better understand human behaviour so that computers can more intelligently anticipate and meet the needs of their users. In formal terms, suppose the user engages in activities that belong to an activity set X, where N denotes the number of activities. A series of sensor readings records the activity, the deep learning model identifies the activity sequence from these readings, and the actual activity from the dataset serves as the reference (Eq. 1).

Wi-Fi sensing module
The goal of the radio frequency (RF) section is to glean information about the channel's time-dependent qualities brought on by human action. In this context, RF signals are transmitted into the environment at predetermined frequencies and phases. The ambient receivers then record these transmissions. The captured signals provide insight into the RF channels and the ways in which they change over time as a result of human activity. By analyzing the data collected, scientists may learn what kinds of activities are taking place in a certain area. The frequency domain and the time domain are the two most commonly used measurement settings for wideband channels. Stepped frequency sweeping is used to take frequency-domain measurements of a channel at a variety of tones within a specified bandwidth. The Vector Network Analyzer (VNA) measures the $K_l$ parameter to determine the channel's complex-valued characteristics. The network's transfer function is calculated from the value of the $K_l$ parameter: at a VNA trigger frequency $f'$, the transfer function at time $t$ is proportional to $K_l(t, f')$.
Received Signal Strength Indicator (RSSI) and Channel State Information (CSI) are typically the two measurements used in Wi-Fi-based HAR approaches to obtain insight into the channel and the influence of human activities. However, the channel impulse response (CIR) can supply far more information than either RSSI or CSI alone. The CIR provides information on the RF channel and the changes that occur to it as a result of the environment and the actions of humans. Each CIR element describes the wireless channel from the transmitter to the receiver, that is, the path from the transmitter to the receiver. In the channel model, $n$ represents the total number of observations, $N$ represents the number of constructive interferences, $A_k$ and $d$ represent the amplitude and delay of the $k$-th scatterer respectively, $f_c$ represents the centre frequency, and $\theta_k$ represents the arbitrary initial phase. The spanned rate follows from this model. The noise present in the CSI data is first efficiently reduced by the data processing module of the Wi-Fi module. Following that, the spectrogram approach is used to extract time-variant micro-Doppler signals. The spectrogram displays the time-variant Doppler characteristics of the RF channel as images. The received signal contains components scattered by both stationary and moving objects. Because only moving objects in the environment produce the Doppler effect, signal components received from stationary objects do not experience any Doppler shift. Therefore, we contend that the fluctuations in the micro-Doppler signatures are caused by the moving object. As a result, the functionality of the Wi-Fi module is not affected by the positioning of static objects. The categorization module receives the spectrogram images, which are saved in JPEG format after processing. The Wi-Fi categorization module is essentially a Hybrid CNN that analyses user behavior to categorize the various tasks that the user carries out.
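The CSI ratio step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as is common for this technique, that the ratio is taken between the CSI of two receive antennas that share the same random per-packet phase offset; the function names and the toy CSI values are hypothetical and not taken from the paper.

```python
import cmath
import random


def csi_ratio(csi_ant1, csi_ant2, eps=1e-12):
    """Element-wise ratio of complex CSI samples from two receive antennas."""
    return [h1 / (h2 if abs(h2) > eps else eps)
            for h1, h2 in zip(csi_ant1, csi_ant2)]


def add_common_phase_offset(csi, offsets):
    """Rotate every CSI sample by the phase offset of its packet."""
    return [h * cmath.exp(1j * phi) for h, phi in zip(csi, offsets)]


# Hypothetical offset-free CSI of one subcarrier over five packets.
clean_ant1 = [complex(1.0 + 0.1 * k, 0.5) for k in range(5)]
clean_ant2 = [complex(0.8, -0.2 + 0.05 * k) for k in range(5)]

# The same random phase offset hits both antennas in each packet (assumption).
rng = random.Random(0)
offsets = [rng.uniform(-cmath.pi, cmath.pi) for _ in range(5)]
noisy_ant1 = add_common_phase_offset(clean_ant1, offsets)
noisy_ant2 = add_common_phase_offset(clean_ant2, offsets)

ratio_clean = csi_ratio(clean_ant1, clean_ant2)
ratio_noisy = csi_ratio(noisy_ant1, noisy_ant2)

# The common offset cancels in the ratio, up to floating-point error.
max_error = max(abs(a - b) for a, b in zip(ratio_clean, ratio_noisy))
print(max_error)
```

Because the offset multiplies numerator and denominator alike, the ratio keeps the activity-induced variation while dropping the packet-level phase error. Noise that differs between antennas would not cancel this way.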

Hybrid convolutional neural network
A CNN, an LSTM, attention, and a dense network comprise the human activity recognition network (Deep Hybrid CNN, DHCNN), shown in Fig. 2. A dense network is used to recognize the behaviors of the subject. This network acts as a classifier by employing the residual concatenation for classification, in combination with the CNN, long short-term memory, and the attention model. The suggested CNN-LSTM structure with self-attention model is depicted in Fig. 2. This framework makes use of CNN layers to dynamically extract features from the input, and it also combines LSTMs and an attention layer to assist with sequence predictions. CNN-LSTMs equipped with self-attention are used to generate text descriptions from captured images as well as to solve visual time-series prediction problems. This architecture is useful for addressing issues that call for sequential output or that involve inputs with temporal and spatial structure. In this study, a deep CNN-LSTM model that incorporates self-attention is proposed as a means of improving recognition accuracy.

$$a_{k,l} = \sigma\left(\sum w_{k,l} \cdot I_{k+a,\,l+b} + B\right) \tag{5}$$

Here $a_{k,l}$ represents the activation, $w_{k,l}$ is the weight, $I_{k+a,l+b}$ denotes the previous neuron, and $B$ denotes the bias. In the experiments that we ran, the deep networks used rectified linear units (ReLU) to compute the local features. The non-linearity is $\sigma(x) = \max(0, x)$. In general, it has been found that the more convolution kernels are utilised, the more hidden characteristics of the input samples may be extracted. One convolutional layer is present in the CNN-LSTM model that incorporates self-attention. In this convolution layer, a total of 16 kernels are utilised for feature extraction. The size of each convolutional kernel ranges from 1 to 5.
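As a rough illustration of the convolution-plus-ReLU operation in Eq. (5), the sketch below computes one feature map in plain Python. It assumes that the sum runs over the kernel offsets (a, b), that the weights are indexed by those offsets, and that no padding is used; the function names and the toy input are hypothetical.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def conv2d_relu(image, kernel, bias=0.0):
    """Valid (no padding) 2-D convolution followed by ReLU.

    out[k][l] = relu(sum_a sum_b kernel[a][b] * image[k + a][l + b] + bias)
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for k in range(out_h):
        row = []
        for l in range(out_w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += kernel[a][b] * image[k + a][l + b]
            row.append(relu(s))
        out.append(row)
    return out


# Toy 3x3 input patch and a 2x2 kernel (illustrative values only).
image = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0],
         [4.0, 0.0, 1.0]]
kernel = [[1.0, 0.0],
          [0.0, 1.0]]
feature_map = conv2d_relu(image, kernel, bias=-1.0)
print(feature_map)  # [[1.0, 4.0], [0.0, 1.0]]
```

The lower-left output is clipped to zero by the ReLU because the weighted sum plus bias is negative there. A layer with 16 kernels, as described above, would produce 16 such feature maps.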
LSTM networks currently perform well across a broad spectrum of temporal tasks. The Long Short-Term Memory (LSTM) network is a form of Recurrent Neural Network (RNN) that is gaining more and more prominence. RNNs are able to predict the output at the current time step based on prior information. However, because of the vanishing gradient problem, RNNs can only retain information over a limited span of time. If gradients cannot flow deep into the network during back-propagation, they vanish. The LSTM was introduced as a novel neuron for the RNN family to solve the issue of long-term dependence.
In order to efficiently extract the temporal characteristics contained in the sequence data, we begin by running the input data through a two-layer LSTM network. The LSTM layer contains a total of 64 memory cells. The behaviour of each LSTM unit is governed by the LSTM gate equations, which route the inputs to the different gates, including the input, forget, and output gates.
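For reference, the widely used textbook form of the LSTM gate equations is given below. This is the standard formulation and is assumed here; the paper's own notation is not available in this text, so the symbols ($x_t$ for the input, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$, $b$ for the learned parameters) are illustrative.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication. The additive cell-state update is what lets gradients persist over long sequences.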

Algorithm 1 (DHCNN)
Step 1 Gather and preprocess your dataset of human activity data. This dataset should include both input data (e.g., images, sequences) and corresponding labels (e.g., activity categories).
Step 2 Import the necessary libraries and deep learning frameworks such as TensorFlow, PyTorch, or Keras.
Step 5 Create and compile the DHCNN model using your chosen deep learning framework. Compile it with an appropriate optimizer, loss function, and evaluation metrics.
Step 7 Train the DHCNN model on the training data using the fit method. Specify the number of epochs, batch size, and validation data.
Step 8 Evaluate the trained model on the test set to assess its performance. Calculate metrics like accuracy, precision, recall, and F1-score.
Step 9 Fine-tune the model by adjusting hyperparameters, architecture, and regularization techniques to improve performance.
Step 10 Once you are satisfied with the model's performance, deploy it for inference on new, unseen data. You can deploy it as part of a larger application or system.
Step 11 Continuously monitor the model's performance and retrain it with new data or fine-tuning as necessary to maintain its accuracy.

Ethical approval
None of the authors experimented with human subjects or animals during this research.

Experimental setup
The scikit-learn and Keras libraries, running on top of a TensorFlow backend, are used to train the model's classification algorithms. The data is split into 80% for training and 20% for testing.
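The 80/20 split can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the paper does not say whether the split was shuffled or stratified, so the shuffling, the fixed seed, and the helper name `split_80_20` are hypothetical choices.

```python
import random


def split_80_20(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle indices and split samples/labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Ten dummy samples with alternating labels (illustrative data only).
samples = [[float(i)] for i in range(10)]
labels = ["walking" if i % 2 == 0 else "sitting" for i in range(10)]
train, test = split_80_20(samples, labels)
print(len(train), len(test))  # 8 2
```

In practice the same result is usually obtained with a library helper. The sketch only makes the proportions explicit.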

Dataset
The movement of the various parts of the human body changes the reflections of the wireless signals, which in turn results in variations in the CSI. People's behaviour may be detected by analysing the data streams produced by CSI for various activities and correlating those streams with stored models. This is accomplished by extracting features from the CSI data streams and applying machine learning techniques to construct models and classifiers.
A dataset is necessary in order to construct and train a model for human activity recognition. We found two publicly accessible datasets 60,61. Both sets of data were gathered using the Linux 802.11n CSI Tool, and each of the transmitter and receiver routers had three antennae. Despite this, we decided not to utilise them because the hardware is now outdated and can no longer be purchased. In addition, the data was collected in sequences, each of which consisted of only a single action.
The sequences can be relatively long, but they do not include transitions between the different actions, nor do they include actions that change rapidly and often over brief intervals of time. Due to these restrictions, a realistic depiction of human behaviour and of real-world data is not possible. Last but not least, the setting in which the data was collected is highly controlled and resembles a laboratory. The publicly available dataset used in our model is shown in Table 1, and the amplitude is shown in Fig. 3. The total train and test data in this dataset is 1,801,440.
The dataset human-activity-recognition-with-smartphones contains the activities Laying, Standing, Sitting, Walking, Walking upstairs, and Walking downstairs. The number of samples per activity can be read from Fig. 4. The train and test data count in this dataset is 7352.

Data cleaning and reduction
The dataset has been cleaned by removing duplicates and invalid values and by ensuring an even distribution of data for each activity. Data reduction was accomplished using well-known methods including principal component analysis and t-SNE. In order to select the most useful characteristics from a dataset, principal component analysis (PCA) can be performed to reduce the dimensionality of the original features. PCA is an unsupervised technique (it works on data without labels) that uses the correlation between attributes to identify the patterns in the data. PCA is used to create a lower-dimensional subspace of features while still retaining the important features of the original feature set. Linear combinations of the features already present in the data set are used to create principal components, which are then used to describe the original data set. On the other hand, if the high-dimensional feature data is non-linear, modelling it with components obtained from PCA will yield a very poor model, leading to less accurate recognition results. Under this restriction, data reduction using the t-SNE approach is a viable solution.
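The cleaning step described above can be sketched as follows. The paper does not specify how the even class distribution was obtained, so random downsampling to the size of the smallest class is an assumption, as are the function name `clean_and_balance` and the toy rows.

```python
import math
import random
from collections import defaultdict


def clean_and_balance(rows, seed=0):
    """Drop invalid rows and duplicates, then downsample to equal class sizes.

    rows: iterable of (features, label), where features is a sequence of floats.
    """
    seen = set()
    by_label = defaultdict(list)
    for features, label in rows:
        # Invalid values: NaN or infinite readings (assumed definition).
        if any(math.isnan(v) or math.isinf(v) for v in features):
            continue
        key = (tuple(features), label)
        if key in seen:  # exact duplicate
            continue
        seen.add(key)
        by_label[label].append(key)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


rows = [
    ((0.1, 0.2), "walking"),
    ((0.1, 0.2), "walking"),           # duplicate
    ((float("nan"), 0.3), "walking"),  # invalid value
    ((0.4, 0.5), "walking"),
    ((0.7, 0.8), "sitting"),
]
cleaned = clean_and_balance(rows)
print(cleaned)
```

Downsampling discards data, so for strongly imbalanced sets other balancing strategies may be preferable. The sketch only shows the three cleaning operations named in the text.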

Result and discussion
Several experiments have been carried out in order to evaluate the effectiveness of the improved CNN model presented earlier. Precision, Accuracy, Recall, and F-Measure are the parameters used to compare the results. These measures express the confidence in the method used for HAR. Accuracy is the proportion of correctly categorized (identified) activities to the total number of classified activities and is depicted in Figs. 5 and 6. Recall is the proportion of the samples of an activity for which a correct identification was made, and precision is the proportion of identifications that were correct; both are stated in (16) and (17) for the HAM. Poor classifiers can still achieve high classification accuracy, making this metric unreliable on its own. Consequently, in addition to this criterion, another conventional measure known as the F-measure is used. Precision and recall are combined in the F1 score, which reflects the confidence in the system's ability to detect the agent's actions. The F-measure is used to assess the reliability of the classification results.
In this case, $T_{pos}$ stands for True Positives, $T_{neg}$ stands for True Negatives, $F_{pos}$ stands for False Positives, and $F_{neg}$ stands for False Negatives. For each dataset, accuracy was evaluated to determine the actual quality of the DHCNN (i.e., taking into account the entire collection of classes), and F-Measure, Precision, and Recall were computed to provide more specific insight into how the DHCNN behaves when distinguishing a specific class. Table 2 shows the results of the performance measures of the EV-HAM method. A graphical representation of the performance evaluation is shown in Fig. 7.
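The four measures can be computed from a confusion matrix as in the sketch below. It uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two), which are assumed to match the paper's equations; the function name and the 2x2 example matrix are hypothetical and unrelated to the reported results.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    results = []
    for c in range(n):
        t_pos = confusion[c][c]
        f_neg = sum(confusion[c]) - t_pos                        # missed samples of class c
        f_pos = sum(confusion[r][c] for r in range(n)) - t_pos   # wrongly labelled as c
        precision = t_pos / (t_pos + f_pos) if t_pos + f_pos else 0.0
        recall = t_pos / (t_pos + f_neg) if t_pos + f_neg else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, results


# Illustrative two-class confusion matrix (not data from the paper).
confusion = [[8, 2],
             [1, 9]]
accuracy, results = per_class_metrics(confusion)
print(accuracy)    # 0.85
print(results[0])
```

Accuracy is a single number for the whole matrix, while precision, recall, and F1 are reported per class, which matches the way the text distinguishes overall quality from class-specific behaviour.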
The confusion matrices of the DHCNN algorithm are shown in Figs. 8 and 9. As can be seen from the blue diagonal cells of the confusion matrix, the DHCNN classifier has an overall accuracy of 99%. The number of properly identified activities is represented in the blue cells of the confusion matrix. As an example, the top diagonal cell displays the number of correctly labelled walking scenarios. The recall rates for laying (100%) and standing (100%) are relatively high. The confusion matrix for the dataset human-activity-recognition-with-smartphones is shown in Fig. 9.
Figure 10 also displays the ROC curves for every class in the categorization, which helps to comprehend the model's set of metrics.The ROC curve is a more reliable indicator of classification accuracy since it is unaffected by the uneven distribution of class labels in the sample.
This plot compares the classifier's true positive rate (recall, or sensitivity) with its false positive rate (fall-out). If a classifier were to guess its classes at random, its ROC curve would follow the diagonal dashed line in the graph. The further a curve lies from the dashed line, the more effective the classifier. For a perfect classifier, all curves would meet in the upper left corner. Because identification was less than ideal, the curves approached the corner but did not reach it, with the exception of the Laying curve, which had the highest individual classification accuracy (99%).
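A one-vs-rest ROC curve for a single class can be traced by sweeping a decision threshold over the classifier scores, as in this minimal sketch. The scores and labels are invented for illustration, and the helper names `roc_points` and `auc` are hypothetical. The area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for the class of interest, 0 for all other classes.
    Assumes at least one positive and one negative sample.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points)
print(area)
```

A random classifier gives an area near 0.5 (the diagonal), while a perfect one gives 1.0, which corresponds to a curve passing through the upper left corner.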
Our EV-HAM system takes advantage of a cloud environment via a wirelessly connected wearable sensor and a DHCNN model, and the resulting architecture meets all of the system's needs. We conducted several tests to evaluate the efficacy of deep learning methods. We test these models on various proposed datasets containing image sequences, and we report on their actual quality. With respect to accuracy across all frame sequences, our suggested DHCNN model outperformed all others.

Step 3 Define the architecture of the DHCNN model, including the following components:
o Input layer: Define the input shape based on your data.
o CNN layers: Specify the number of convolutional layers, filter sizes, activation functions, etc.
o LSTM layers: Specify the number of LSTM layers, the number of memory cells, return sequences if needed.
o Attention mechanism: Define the self-attention mechanism.
o Classifier (Dense network): Specify the dense layers for classification.
Step 4 Create individual components. Define functions to create individual components of the model:
o create_cnn_layers: Define CNN layers.
o create_lstm_layers: Define LSTM layers.
o apply_self_attention: Define the self-attention mechanism.
o create_dense_classifier: Define the dense layers for classification.

Figure 3. The amplitude collected from the dataset.

Figure 7. Graphical representation of performance evaluation of EV-HAM.

Figure 8. Confusion matrix of the publicly available dataset.

Figure 10. ROC of our proposed model EV-HAM.

Table 1. Number of activities from the public dataset.
Scientific Reports | (2023) 13:22470 | https://doi.org/10.1038/s41598-023-48860-5
Additionally, a search was conducted over the following hyper-parameters for each framework: learning rate from 0.00001 to 0.004 in increments of 0.0005; number of samples: 16, 32, 64; training iterations from 100 to 400 in steps of 100. The Adam optimizer is used in the proposed hybrid CNN model.
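The hyper-parameter search space described above can be enumerated as in this sketch. It assumes that the learning-rate grid starts at 0.00001 and stops at the last value not exceeding 0.004, that "number of samples" means the batch size, and that every combination is tried (a full grid); the paper does not state the search strategy, so these are illustrative choices.

```python
from itertools import product

# Learning rates 0.00001, 0.00051, ..., 0.00351 (step 0.0005, capped at 0.004).
# Built from integers to avoid accumulating floating-point error.
learning_rates = [round((1 + 50 * k) * 1e-5, 5) for k in range(8)]
batch_sizes = [16, 32, 64]          # "number of samples" (assumed batch size)
iterations = [100, 200, 300, 400]   # training iterations, step 100

grid = list(product(learning_rates, batch_sizes, iterations))
print(len(grid))  # 96
print(grid[0])    # (1e-05, 16, 100)
```

Each tuple in `grid` would correspond to one training run with the Adam optimizer, with the best configuration chosen on validation performance.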

Table 2. Performance measure of EV-HAM.

Table 3. Performance comparison with baseline model.